Audio-JEPA: Joint-Embedding Predictive Architecture for Audio Representation Learning
Tuncay, Ludovic, Labbé, Etienne, Benetos, Emmanouil, Pellegrini, Thomas
Self-Supervised Learning (SSL) has revolutionized representation learning for speech and audio, enabling models to learn from unlabeled data and excel in diverse downstream tasks [1, 2, 3, 4]. Early SSL approaches for audio, such as contrastive predictive coding and wav2vec 2.0, learned latent speech representations by masking the input and solving a contrastive task over latent codes [5]. Follow-up methods like HuBERT [1] introduced offline clustering to generate pseudo-labels for masked audio segments, and WavLM [6] applied data augmentation and denoising to improve robustness in speech representation learning. More recently, latent prediction approaches have gained traction: data2vec [7] and its efficient successor data2vec 2.0 [8] employ a teacher-student framework to predict contextualized latent representations of the input, achieving strong results across vision, speech, and language tasks. In the audio domain, Niizumi et al. introduced Masked Modeling Duo (M2D) [4], which uses two networks (an online and a momentum encoder) to predict masked patch embeddings and attained state-of-the-art results on numerous audio benchmarks. In computer vision, a new paradigm called Joint-Embedding Predictive Architecture (JEPA) [9, 10, 11] has been proposed to predict hidden content in a high-level latent space instead of pixel space.
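The teacher-student latent prediction idea behind data2vec- and M2D-style training can be sketched with toy linear "encoders": the teacher is an exponential moving average (EMA) of the student, and the student is trained to predict the teacher's latent representation rather than raw audio or pixels. Everything below (shapes, learning rate, EMA decay, the linear encoders themselves) is an illustrative assumption, not taken from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoders": the student is trained, the teacher (momentum
# encoder) is an exponential moving average of the student's weights.
dim_in, dim_out = 8, 4
student = rng.normal(size=(dim_in, dim_out))
teacher = rng.normal(size=(dim_in, dim_out))

def ema_update(teacher, student, tau=0.99):
    """Momentum update: teacher <- tau * teacher + (1 - tau) * student."""
    return tau * teacher + (1 - tau) * student

def latent_loss(x, student, teacher):
    """MSE between student predictions and teacher targets, in latent space."""
    return float(np.mean((x @ student - x @ teacher) ** 2))

x = rng.normal(size=(16, dim_in))  # stands in for (masked) patch features
init_loss = latent_loss(x, student, teacher)
for _ in range(200):
    # Closed-form gradient of the latent MSE w.r.t. the student weights.
    grad = 2.0 * x.T @ (x @ student - x @ teacher) / x.shape[0]
    student -= 0.01 * grad
    teacher = ema_update(teacher, student)
final_loss = latent_loss(x, student, teacher)
```

The key design point this illustrates is that the loss lives in latent space, so the model never reconstructs the input itself; the slowly moving teacher keeps the targets stable while student and teacher converge toward each other.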
BESA: Boosting Encoder Stealing Attack with Perturbation Recovery
Ren, Xuhao, Liang, Haotian, Wang, Yajie, Zhang, Chuan, Xiong, Zehui, Zhu, Liehuang
To boost encoder stealing attacks whose performance is hindered by perturbation-based defenses, we propose BESA, a boosting encoder stealing attack with perturbation recovery designed to overcome such defenses. The core of BESA consists of two modules, perturbation detection and perturbation recovery, which can be combined with canonical encoder stealing attacks. The perturbation detection module uses the feature vectors returned by the target encoder to infer the defense mechanism employed by the service provider. Once the defense mechanism is detected, the perturbation recovery module leverages a well-designed generative model to restore a clean feature vector from the perturbed one. Through extensive evaluations on various datasets, we demonstrate that BESA significantly enhances the surrogate encoder accuracy of existing encoder stealing attacks, by up to 24.63% against state-of-the-art defenses and combinations of multiple defenses. Pre-trained encoders are extensively utilized across various domains in real-world scenarios [1]. However, training well-performing pre-trained encoders is a time-consuming, resource-intensive, and costly process [2]. Hence, encoder owners are highly motivated to safeguard the privacy of their pre-trained encoders. Unfortunately, recent works have shown that pre-trained encoders are susceptible to encoder stealing attacks [3], which allow an attacker to create a surrogate encoder that closely mimics the functionality of a target encoder simply by querying it through its APIs. The consequences of such attacks can be quite severe.
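As a purely hypothetical illustration of the perturbation-recovery idea (not BESA's actual generative model), the sketch below fits a plain least-squares map from perturbed to clean features on simulated paired data; the simulated defense, dimensions, and noise level are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: suppose an attacker has paired (perturbed, clean)
# feature vectors for some shadow data. BESA trains a generative model to
# undo the defense; a least-squares linear map stands in for it here.
dim = 16
clean = rng.normal(size=(500, dim))

# Simulated perturbation-based defense: a fixed linear distortion plus noise.
distortion = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
perturbed = clean @ distortion + 0.05 * rng.normal(size=clean.shape)

# Fit the recovery map on the pairs (least squares: perturbed -> clean).
recovery, *_ = np.linalg.lstsq(perturbed, clean, rcond=None)

# On held-out data, recovered features are much closer to the clean ones.
test_clean = rng.normal(size=(100, dim))
test_pert = test_clean @ distortion + 0.05 * rng.normal(size=test_clean.shape)
err_before = float(np.mean((test_pert - test_clean) ** 2))
err_after = float(np.mean((test_pert @ recovery - test_clean) ** 2))
```

The recovered features can then be fed to any canonical stealing attack in place of the perturbed ones, which is the sense in which the recovery module composes with existing attacks.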
Building Bridges between Regression, Clustering, and Classification
Stewart, Lawrence, Bach, Francis, Berthet, Quentin
Regression, the task of predicting a continuous scalar target y from features x, is one of the most fundamental tasks in machine learning and statistics. It has been observed and theoretically analyzed that the classical approach, mean-squared error minimization, can lead to suboptimal results when training neural networks. In this work, we propose a new method to improve the training of these models on regression tasks with continuous scalar targets. Our method recasts the task using a target encoder and a prediction decoder, inspired by approaches in classification and clustering. We showcase the performance of our method on a wide range of real-world datasets.
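One simple way to cast scalar regression in a classification/clustering style, offered here as a guess at the flavor of the approach rather than the authors' actual encoder/decoder pair, is to encode each target as a soft assignment over bin centers and decode a predicted distribution back to a scalar. Bin count, temperature, and the distance-based softmax are all assumptions of this sketch.

```python
import numpy as np

def target_encoder(y, centers, temperature=1e-3):
    """Encode scalar targets as soft assignments over K bin centers
    (a softmax over negative squared distances)."""
    logits = -((y[:, None] - centers[None, :]) ** 2) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def prediction_decoder(p, centers):
    """Decode a distribution over bins back to a scalar prediction
    (the expected bin center)."""
    return p @ centers

centers = np.linspace(0.0, 1.0, 21)     # K = 21 bin centers on [0, 1]
y = np.array([0.05, 0.5, 0.93])
p = target_encoder(y, centers)          # soft labels for a K-way classifier
y_hat = prediction_decoder(p, centers)  # round-trip back to scalars
```

A model trained with cross-entropy against these soft labels then inherits classification-style losses and calibration while the decoder restores a continuous output.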
An In-Depth Analysis of Adversarial Discriminative Domain Adaptation for Digit Classification
Choi, Eugene, Rodriguez, Julian, Young, Edmund
Domain adaptation is an active area of research driven by the growing demand for robust machine learning models that perform well on real-world data. Adversarial learning for deep neural networks (DNNs) has emerged as a promising approach to improving generalization ability, particularly for image classification. In this paper, we implement a specific adversarial learning technique known as Adversarial Discriminative Domain Adaptation (ADDA) and replicate digit classification experiments from the original ADDA paper. We extend their findings by examining a broader range of domain shifts and provide a detailed analysis of in-domain classification accuracy post-ADDA. Our results demonstrate that ADDA significantly improves accuracy across certain domain shifts with minimal impact on in-domain performance. Furthermore, we provide qualitative analysis and propose potential explanations for ADDA's limitations in less successful domain shifts. Code is at https://github.com/eugenechoi2004/COS429_FINAL.
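The adversarial step at the heart of ADDA can be sketched in a toy setting: a target encoder is updated so that its features score as "source" under a logistic discriminator. The discriminator is frozen here for simplicity, whereas ADDA alternates discriminator and encoder updates; all dimensions and hyperparameters are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Frozen logistic discriminator: scores features as "source" (1) vs "target" (0).
dim = 4
w, b = rng.normal(size=dim), 0.0
x_t = rng.normal(size=(64, dim)) + 2.0  # shifted target-domain inputs
W = np.eye(dim)                         # target encoder, initialized at identity

def source_score(W):
    """Mean discriminator probability that encoded target features are source."""
    return float(np.mean(sigmoid((x_t @ W) @ w + b)))

before = source_score(W)
for _ in range(100):
    f = x_t @ W
    s = sigmoid(f @ w + b)
    # Gradient of -log d(f) w.r.t. W: the encoder tries to fool the discriminator.
    grad = -x_t.T @ ((1.0 - s)[:, None] * w[None, :]) / len(x_t)
    W -= 0.05 * grad
after = source_score(W)
```

After the encoder updates, the discriminator can no longer tell the encoded target features from source features, which is the domain-confusion objective that ADDA optimizes.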